Abstract



We consider the prisoner's dilemma being played repeatedly on a dynamic network, 
where agents may choose their actions as well as their co-players. This leads to 
co-evolution of network structure and strategy patterns of the players. Individual 
decisions are made fully rationally and are based on local information only. They 
are made such that links to defecting agents are resolved and that cooperating 
agents build up new links. The exact form of the updating scheme is motivated 
by profit maximization and not by imitation. If players update their decisions in 
a synchronized way the system exhibits oscillatory dynamics: Periods of growing 
cooperation (and total linkage) alternate with periods of increasing defection. The 
cyclical behavior is reduced and the system stabilizes at significant total cooper- 
ation levels when players are less synchronized. In this regime we find emergent 
network structures resembling 'complex' and hierarchical topology. The exponent
of the power-law degree distribution ($\gamma \approx 8.6$) matches empirical results
for human communication networks.
1 Introduction



Recent years have seen a drastic increase of interest in understanding the
emergence of complex structures in nature and society. In this context, network
theory has played an important role because it provides a topological substrate
of discrete real-world interactions [1,2]. The science of networks basically
follows two main lines of research: On the one hand, the formation of network
structures is studied involving predominantly topological parameters (e.g.
preferential attachment [2]). On the other hand, dynamical processes on networks
with fixed topology have been studied, see e.g. [3]. In this context, game
theoretic models - in particular the prisoner's dilemma - have also been
analyzed on distinct network topologies. So far not much work has focused on
the co-evolution of topological and internal degrees of freedom, see however [4]
for a quite general starting point. In this article we specifically address
this matter by discussing a model where internal and topological degrees of
freedom mutually influence each other. The model is based on the prisoner's
dilemma (PD) [5] - one of the most impressive ways of illustrating situations
of human interactions where mutual trust is beneficial, but egotism leads to a
breach of promise. The fundamental interest in the PD arises from its
applicability in a variety of fields, ranging from physics and biology to
economics and finance [6,7,8]. It is of particular interest in constitutional
economics [9,10,11].

The central element of the PD is the payoff matrix, whose specific form
reads, for the payoff of one of two players, say player $i$,



Each player has two options: she can defect (D) or cooperate (C). Mutual 
cooperation yields the highest total payoff - giving each of the players an 
equal payoff of R (reward); this is the optimal strategy when seen from a 
'global' point of view. If one of the players defects while the other cooperates, 
the highest attainable individual payoff - i.e. the temptation, T - goes to the 
defector and the cooperator receives the lowest possible payoff, the sucker's 
payoff, S. The cooperator would have been better off had she defected,
thus receiving the payoff I. In a one-shot game, the dilemma holds as long
as the entries of the payoff matrix, Eq. (1), satisfy $S < I < R < T$ and
$2R > T + S$.

Much research has concentrated on spatial aspects of this game, initially
introduced in the pioneering work of Nowak & May [12]. In their work, players
are located on a square lattice and play repeatedly (!) with their neighbours.
Cooperation is made possible by the assumption that every agent imitates his
neighbours in such a way that they synchronously choose the actions of the
neighbours who got the highest payoff in the last turn. This specific rule of
evolution leads to non-trivial complex spatio-temporal dynamics. It has been
noted that the model is seriously troubled by the fact that an asynchronous
update of strategies leads to the break-down of cooperation [13].
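The two conditions on the payoff entries of the one-shot game ($S < I < R < T$ and $2R > T + S$) are easy to check mechanically. The following is a minimal sketch; the function name and the numerical values are illustrative assumptions and are not read off Eq. (1).

```python
def is_prisoners_dilemma(R, S, T, I):
    """True if the payoffs satisfy S < I < R < T and 2R > T + S."""
    return S < I < R < T and 2 * R > T + S


# Illustrative numbers only (an assumption, not taken from Eq. (1)).
print(is_prisoners_dilemma(R=3, S=-1, T=4, I=0))  # ordering and 2R > T + S both hold
print(is_prisoners_dilemma(R=3, S=-1, T=8, I=0))  # ordering holds, 2R > T + S fails
```

The second condition ensures that mutual cooperation yields a higher total payoff than alternating between exploiting and being exploited.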

Carrying the discussion to more complex structures, the initial model of Nowak
& May and slight adaptations thereof have been extensively studied under the
aspect of different interaction topologies, see [13] for a review of recent
developments. In [15] it was shown that for a variety of dilemmas (including
the PD), heterogeneous networks (e.g. scale-free networks) favor the emergence
of cooperation. The role of hierarchical lattices was discussed in detail in
[16]. Interesting aspects of the PD on random graphs have been analyzed in [17];
the role of small world networks was tackled in [18]. Effects of entries in the
payoff matrix and addition of noise have been examined on different types of
two-dimensional lattices [19]. Other topology related topics, such as the role
of an 'influential node' [20] and optional participation [21], have been
examined as well. In essence, a vast number of possible topologies and
formulations has been studied. However, in all of these studies the networks
are static and do not evolve dynamically.

It is important to note that internal sanctions (refusal/termination of links) 
and positive feedback mechanisms ('preferential' choice of cooperating agents) 
are both directly related to variability of the underlying network and may 
play an important role in real-life situations. A few promising studies have
advanced research in this direction in recent years: In [22], players
keep a running average of payoffs obtained from each other player in a
simulated tournament. These averages effectively determine whom to approach
and whom to accept as co-player in the future. The strategies are basically
determined by the 'genetic code' of the players and altered by crossover and
mutation during the tournament. In [23] it was examined how preferential
partner selection influences the performance of fixed strategies, thus serving 
as a starting-point for models in which players can also choose their strategy. 
Recently, results where the evolution of strategies is driven by imitation and 
coupled with evolution of the interaction network have been presented [23] 
(see [25] for a discussion of sociological aspects). Keeping the total number of 
links fixed, the authors found that the system may reach a steady state where 
agents predominantly cooperate. 

In the present paper, we study a formulation of the prisoner's dilemma that
includes both network dynamics and choice of actions. Close to the 'original
formulation' of the PD, rationality (not imitation) will be the basis of
individual decisions. We also keep information-horizons local, thus basing our
model-dynamics on a quite strict interpretation of the homo oeconomicus. We
show that the dilemma of overall defection can be overcome despite rationality,
thus resolving the prisoner's dilemma: Even within a population of selfish
agents (who maximize their expected payoffs for the next turn) cooperation
emerges without the necessity of external rules, imitative behaviour or the
introduction of strategies. As no memory of the agents is involved, our model
also remains temporally local and incorporates co-evolutionary dynamics which
display interesting collective phenomena and nonlinear dynamics. It is
especially intriguing that the resulting cooperation networks show the same
power-law exponents as those found in real communication networks, in
particular in mobile phone-call networks [26].

The paper is structured as follows: The model is presented in Section 2. In
Section 3 results based on a numerical implementation of the model are
presented. The influence of the model parameters is discussed, as is the
structure of the networks obtained. Finally, a discussion of the main results
is provided in Section 4.



2 The model 



We consider a network with a fixed number of $N$ agents/players with a variable
number $L$ of links between them, where linked agents play the PD-game. By
$\mathcal{N}_i(t)$ we denote the set of $l_i(t)$ neighbours of agent $i$ on the
network at timestep $t$. The actions of the agents are encoded in
two-dimensional unit-vectors, i.e. $a_i(t) = a^c = (1,0)$ if agent $i$
cooperates and $a_i(t) = a^d = (0,1)$ if agent $i$ defects. We assume that
agent $i$ has full knowledge about the chosen strategy $a_j(t)$ and the payoff
$P_j(t)$ of each of her neighbours $j$, but no knowledge about these quantities
for all the other players (local information). At each timestep, agent $i$
performs an update of her action and local neighbourhood with probability
$p_u$. Thus, decisions for choosing neighbours and actions are made
simultaneously by an average number of
$N_{\mathrm{update}} \approx p_u N_{\mathrm{tot}}$ agents. For $p_u = 1$, the
decisions of the agents are fully synchronized, whereas $p_u < 1$ automatically
includes the important case of asynchronous updates [13]. Once chosen for
update, agent $i$ maximizes her expected payoff in the next round, i.e. she
maximizes

$$P_i(t+1) = \sum_{j \in \mathcal{N}_i(t+1)} a_i(t+1) \, P_{ij} \, a_j(t+1). \tag{2}$$

Here, $P_{ij}$ denotes the payoff matrix, Eq. (1), and $a_j(t+1)$ the expected
action of neighbour $j$. The preceding action is taken as a reasonable
expectation value,¹ $a_j(t+1) = a_j(t)$. In Eq. (2), profit-maximization of
agent $i$ is performed by adjusting the future action $a_i(t+1)$ as well as the
future (expected) neighbourhood $\mathcal{N}_i(t+1)$. At this point, the
network-dynamics need to be specified further. In the following we specify
detailed rules concerning the individual updates
$\mathcal{N}_i(t) \to \mathcal{N}_i(t+1)$.

¹ This step may be criticized as being inductive. However, for $p_u < 1$ player
$i$ will know that his neighbours will keep their strategy on average. Thus our
argument is inductive only for $p_u = 1$.
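The payoff sum of Eq. (2), evaluated with the expectation that each neighbour repeats her previous action, can be sketched as follows. This is only an illustration: the payoff values, the row/column convention of the matrix and all function names are assumptions made for this example and are not taken from Eq. (1).

```python
# Assumed placeholder payoffs; rows: own action (C, D), columns: co-player (C, D).
R, S, T, I = 3, -1, 4, 0
PAYOFF = ((R, S), (T, I))

A_C = (1, 0)  # unit vector encoding cooperation
A_D = (0, 1)  # unit vector encoding defection


def pair_payoff(a_i, a_j, payoff=PAYOFF):
    """Bilinear form a_i . P . a_j for two action vectors."""
    return sum(a_i[r] * payoff[r][c] * a_j[c] for r in range(2) for c in range(2))


def expected_payoff(a_i, neighbour_actions, payoff=PAYOFF):
    """Sum of pair payoffs over the neighbourhood, using the neighbours' last actions."""
    return sum(pair_payoff(a_i, a_j, payoff) for a_j in neighbour_actions)


neighbours = [A_C, A_C, A_D]  # two cooperating neighbours, one defecting
print(expected_payoff(A_C, neighbours))  # equals 2R + S
print(expected_payoff(A_D, neighbours))  # equals 2T + I
```

With a fixed neighbourhood, defection always gives the larger sum; cooperation can only win through the links that are added or removed, which is what the following rules specify.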

First, we assume that agents cancel a link if the payoff with the respective
co-player is smaller than or equal to zero, i.e. if the link does not pay off. A
unilateral decision for link-cancellation suffices to break off a relationship.
The maximum number of links an agent may cancel in one period is limited by a
model-parameter $\alpha$; the neighbourhood after cancellation of $\alpha$ links
is denoted by $\mathcal{N}_i(t+1;\alpha)$. The parameter $\alpha$ models the
maximum number of relations (to defectors) one is willing to quit per timestep,
thus describing the 'sanction-potential' in the system.
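As a minimal sketch, the cancellation rule can be written as a function that removes at most a given number of non-paying links. The function name and data layout are hypothetical. The text does not say which links are dropped when more links fail to pay off than the sanction parameter allows; choosing them at random is an assumption made here.

```python
import random


def cancel_links(neighbour_payoffs, alpha, rng=random):
    """Return the neighbourhood that remains after cancelling at most alpha links.

    neighbour_payoffs maps each neighbour j to the payoff obtained from the link to j.
    """
    # Links with payoff <= 0 "do not pay off" and are candidates for cancellation.
    non_paying = [j for j, p in neighbour_payoffs.items() if p <= 0]
    if len(non_paying) > alpha:
        # Assumption: the text does not fix this choice; pick alpha links at random.
        non_paying = rng.sample(non_paying, alpha)
    cancelled = set(non_paying)
    return {j for j in neighbour_payoffs if j not in cancelled}


payoffs = {1: 3, 2: -1, 3: -1, 4: 0}
print(sorted(cancel_links(payoffs, alpha=6)))  # only the paying link to agent 1 remains
```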

As far as the creation of new links is concerned, we assume that only agents
who have chosen to cooperate have the possibility to establish new links. We
make this assumption for two reasons: On the one hand, one could assume that
the players enter commitments about their future strategies (a typical element
of cooperative game theory). Then, their strategies will practically be known
in advance by potential co-players and it is reasonable that links offered by
agents who announce that they will defect will not be accepted. On the other
hand, it is tempting to conjecture that a mechanism of 'recommendation' governs
decisions of acceptance or refusal of new link-proposals: It is only rational
(and we have assumed rationality of the players) that next-to-nearest
neighbours of $i$ will 'poll' neighbours in common with $i$ to get an idea
about $i$'s strategy and that they would only accept a link offered by $i$ in
case $i$ is 'known' to cooperate. In this framework it is natural that a
unilateral decision to establish a new connection suffices, i.e. if player $i$
cooperates and decides to link with player $j$, the link will be accepted with
certainty as $j$ has no reason to refuse (the new link will allow him to pocket
a riskless profit). Together with the unilateralism in cancellation of links
this makes complicated 'matching' of the agents' decisions unnecessary. We
limit the maximum number of links which may be established per timestep by the
parameter $\beta$; thus we incorporate some constraint on resources which can
be spent to establish new links.² The new neighborhood associated with the
establishment of $\beta$ links is denoted by $\mathcal{N}_i(t+1;\beta)$. In
summary, the parameters $\alpha$ and $\beta$ can also be seen as 'agility' or
willingness to change partners upon new information.

With the given specification of network-dynamics, we can formulate the
maximization of the payoff, Eq. (2), in the following way: Each of the
$N_{\mathrm{update}}$ agents calculates the expected payoff in case of
cooperation,

$$P_i^c(t+1) = \sum_{j \in \mathcal{N}_i^c(t+1)} a^c P_{ij} a_j(t) = \sum_{j \in \mathcal{N}_i(t+1;\alpha)} a^c P_{ij} a_j(t) + P_i^{\mathrm{add}}(t+1;\beta) \tag{3}$$

and the expected payoff in case of defection,

$$P_i^d(t+1) = \sum_{j \in \mathcal{N}_i^d(t+1)} a^d P_{ij} a_j(t) = \sum_{j \in \mathcal{N}_i(t+1;\alpha)} a^d P_{ij} a_j(t) \tag{4}$$

and will choose the strategy with the higher payoff (if payoffs are equal the
strategy is chosen at random). $\mathcal{N}_i^c(t+1)$ and
$\mathcal{N}_i^d(t+1)$ denote the 'expected' neighbourhoods for the two cases.
For cooperation they can be written as
$\mathcal{N}_i^c(t+1) = \mathcal{N}_i(t+1;\alpha) \cup \mathcal{N}_i(t+1;\beta)$
and for defection as $\mathcal{N}_i^d(t+1) = \mathcal{N}_i(t+1;\alpha)$
($A \cup B$ denotes the union of sets $A$ and $B$).

² In future work, it could be interesting to study the effect of costs by
making $\beta$ proportional to some measure of payoff.
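The comparison between the expected payoffs for cooperation and defection, including the random tie-break, can be sketched as below. The function names are hypothetical, the additional payoff from new links is passed in as a given number, and the payoff values in the usage lines are assumptions for illustration.

```python
import random


def expected_payoffs(n_coop, n_defect, p_add, R, S, T, I):
    """Expected payoffs (cooperate, defect) on the neighbourhood kept after cancellation.

    n_coop / n_defect: numbers of remaining neighbours who last cooperated / defected.
    p_add: estimated additional payoff from new links (available to cooperators only).
    """
    coop = n_coop * R + n_defect * S + p_add
    defect = n_coop * T + n_defect * I
    return coop, defect


def choose_action(p_coop, p_defect, rng=random):
    """Pick the action with the higher expected payoff; ties are broken at random."""
    if p_coop > p_defect:
        return "C"
    if p_defect > p_coop:
        return "D"
    return rng.choice(("C", "D"))


# Assumed illustrative payoffs: R = 3, S = -1, T = 4, I = 0.
coop, defect = expected_payoffs(n_coop=5, n_defect=0, p_add=10, R=3, S=-1, T=4, I=0)
print(choose_action(coop, defect))  # cooperation wins because of the additional payoff
```

Without the additional payoff term the comparison always favours defection, so the prospect of new links is what makes cooperation rational in this setting.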

Now, the missing piece for determining the action in the next timestep is the
estimation of the additional expected payoff
$P_i^{\mathrm{add}}(t+1;\beta)$, which can be acquired due to new links. To do
so, each agent performs an evaluation of the neighborhood using only
information about nearest neighbors, as illustrated in Figure 1. Agent $i$
first evaluates the payoff he obtains from the set of neighbours
$\mathcal{N}_{(ij)}$ he and $j$ have in common, denoted by
$P_{\mathcal{N}_{(ij)}}(t)$. He can then subtract this payoff from $j$'s total
payoff, $P_j(t)$, to obtain an approximation of the profit $j$ gains from the
neighbours they do not have in common, the set of which is denoted by
$\bar{\mathcal{N}}_{(ij)}$. Weighting this estimate with the fraction
$\beta / |\bar{\mathcal{N}}_{(ij)}|$ and averaging over all neighbours
$\mathcal{N}_i(t)$, agent $i$ obtains the expected additional payoff he receives
when establishing $\beta$ random links to next-to-nearest neighbours,

$$P_i^{\mathrm{add}}(t+1;\beta) = \frac{1}{l_i(t)} \sum_{j \in \mathcal{N}_i(t)} \Theta(P_j(t)) \left[ P_j(t) - P_{\mathcal{N}_{(ij)}}(t) \right] \frac{\beta}{|\bar{\mathcal{N}}_{(ij)}|}, \tag{5}$$

which completes the model. Regarding the sum in Eq. (5), we found it realistic
to confine the summation over $\mathcal{N}_i(t)$ to a subset of
$\mathcal{N}_i(t)$, namely to the first-nearest neighbours of $i$ who have a
payoff $P_j(t) > 0$: It would hardly be rational for an agent to build up links
for which he knows that the expected payoff is negative on average. This is
also why $l_i(t)$ denotes the effective number of neighbours contributing to
the sum. Although the numerical results given in the next section refer to this
specific formulation of the model, dropping the $\Theta(P_j(t))$-term gives
practically the same results (we will discuss the minuscule effect of this term
below). We also note that Eq. (5) gives the highest possible value only if all
next-to-nearest neighbours cooperate.³

Fig. 1. Illustration for the notation of variables characterizing the
neighborhood of players $i$ and $j$. The players have two neighbors in common;
the corresponding set is denoted by $\mathcal{N}_{(ij)}$. Agent $j$ has
$|\bar{\mathcal{N}}_{(ij)}|$ neighbors not in common with $i$ which are
potential new coplayers for $i$. The payoff player $i$ obtains from the set of
neighbours $\mathcal{N}_{(ij)}$ is denoted $P_{\mathcal{N}_{(ij)}}$.
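The local estimate of Eq. (5) can be sketched in a few lines. This is our reading of the equation rather than the code used for the simulations: the weighting by the number of new links divided by the number of non-common neighbours, the exclusion of agent i from the non-common set, and the handling of neighbours without any non-common neighbours are assumptions, as are all names and the data layout.

```python
def additional_payoff(i, neighbours, payoff, link_payoff, beta):
    """Estimate the extra payoff agent i expects from beta new links.

    neighbours:  dict mapping each agent to the set of its neighbours
    payoff:      dict mapping each agent j to its total payoff P_j(t)
    link_payoff: dict mapping (i, k) to the payoff i obtained from its link to k
    """
    # Only neighbours with positive payoff contribute (the Theta term).
    contributing = [j for j in neighbours[i] if payoff[j] > 0]
    if not contributing:
        return 0.0
    total = 0.0
    for j in contributing:
        common = neighbours[i] & neighbours[j]
        # Assumption: i itself is not counted among j's non-common neighbours.
        not_common = neighbours[j] - neighbours[i] - {i}
        if not not_common:
            continue  # assumption: j offers no candidates and contributes zero
        p_common = sum(link_payoff[(i, k)] for k in common)
        total += (payoff[j] - p_common) * beta / len(not_common)
    return total / len(contributing)


# Toy network: agent 0 is linked to 1 and 2; agent 1 is also linked to 2 and 3.
nbrs = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
total_payoff = {1: 9, 2: 6, 3: 3}
from_links = {(0, 1): 3, (0, 2): 3}
print(additional_payoff(0, nbrs, total_payoff, from_links, beta=1))
```

In the toy network only agent 1 offers a candidate (agent 3), so the estimate is the payoff agent 1 earns outside the common neighbourhood, averaged over the two contributing neighbours.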

After the evaluation of Eqs. (3) and (4) has taken place, the strategies of
the $N_{\mathrm{update}}$ agents are updated at the end of each timestep and
links are removed and built up. We have already discussed that there is no need
for a complicated 'matching'-procedure, as the dynamics are governed by
unilateral decisions (of course, it will also happen that two players both
decide to play with each other in the next turn). Finally we note that an agent
is randomly wired with one link into the network if he happens to lose all his
links during the time evolution of the system.



3 Results and Discussion 

In the following we discuss the dependence of the model on its three main
parameters - $p_u$, $\alpha$ and $\beta$. As a starting point for our
simulations, we generated random networks [27] of size $N = 10^3$. Our
simulations have clearly shown that the dynamics and the emerging interaction
networks do not depend on the initial configurations: The system converges
relatively fast towards its attractors (repulsors). Simulations have typically
been performed for $10^5$ timesteps, providing accurate statistics. If not
stated otherwise, the payoff matrix was chosen in the specific form given in
Eq. (1). We also studied the effect of changing the entries in $P_{ij}$, which
will be discussed below.



³ It is easy to see that, if $j$ does not cooperate, $i$ can adjust for this by
calculating his payoff on $\mathcal{N}_{(ij)}$ assuming defection and
correcting $P_j(t) - P_{\mathcal{N}_{(ij)}}(t)$ for the difference in the payoff
matrix between defection and cooperation.
Fig. 2. (a) Time-series of the fraction of cooperating agents $f_c$ for
different values of the update-probability $p_u$ ($p_u = 1.0$, $p_u = 0.5$,
$p_u = 0.1$), showing decreasing regularity. (b) Time-series of the average
number of links $\langle l_i \rangle$ per agent for the same values of the
update-probability $p_u$. Clearly, the time average of $\langle l_i \rangle$ for
$p_u = 0.1$ and $p_u = 0.5$ is considerably above that for $p_u = 1$,
indicating the stabilisation of the corresponding network. (c) Comparison of
the time-series for $p_u = 0.6$ of the model with that for
$p_u^{\mathrm{rand}} = 1.0$ of a 'random' formulation of the model described in
the text. The inset shows the empirical distribution of both time-series.



3.1 Properties of cooperation time-series 

To discuss basic properties of the time-series, Figure 2 depicts the fraction
of cooperating agents (denoted by $f_c(t) = N_c(t)/N$) and the average linkage
$\langle l_i(t) \rangle = L(t)/N$ of a particular simulation for
$\alpha = \beta = 6$ and various values of $p_u$. For $p_u = 1$, oscillations
with a comparatively high amplitude are observed. The average linkage also
oscillates strongly between a minimum of about 4 and a maximum of 13 links per
agent. The reason for the cyclical behavior of the system can be easily
understood: In the states corresponding to low $f_c$, linkage has been reduced
to an extent motivating the agents to build up links again. In configurations
with high $f_c$, the majority of agents has collectively acquired a state of
maximum linkage, meaning there is no more motivation to cooperate in our
rational setting. This follows since agent $i$ only cooperates as long as the
condition

$$P_i^{\mathrm{add}}(t+1) + l_i^c(t) R + l_i^d(t) S > l_i^c(t) T + l_i^d(t) I \tag{6}$$

is fulfilled, i.e. as long as the payoff expected from cooperation is larger
than the payoff expected from defection. Here, $l_i(t) = l_i^c(t) + l_i^d(t)$,
where $l_i^c(t)$ denotes the number of links to cooperating neighbours and
$l_i^d(t)$ the number of links to defecting neighbours. If all neighbours of
agent $i$ cooperate ($l_i^d(t) = 0$), he will defect as soon as
$l_i(t) > P_i^{\mathrm{add}}/(T-R)$, where $P_i^{\mathrm{add}}$ is bounded from
above by $\beta R$. Therefore, for the parameters chosen here, agents in a
cooperative neighborhood will only cooperate as long as $l_i(t) < 18$. The
observed rapidity of the oscillations becomes clear when one considers that the
agents may build up $\beta = 6$ links per move and therefore reach
$l_i^{\mathrm{max}}$ comparatively fast. By lowering $\alpha$ and $\beta$ the
amplitudes reduce, as intuitively expected (not shown).

Fig. 3. (a) States $f_c$ visited by the dynamics as a function of the update
probability $p_u$. $p_u$ is changed in discrete steps. For each $p_u$, 500
consecutive states $f_c$ are plotted. (b) Range
$(f_c^{\mathrm{max}} - f_c^{\mathrm{min}})$ of the oscillations as a function of
$p_u$, averaged over 500 independent realizations of time-series.
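The cooperation condition of Eq. (6) and the resulting upper bound on the degree in a purely cooperative neighbourhood can be checked with a short sketch. The payoff values R = 3, S = -1, T = 4 and I = 0 are assumed here for illustration; they are chosen so that the bound reproduces the value 18 quoted above for six new links per move, and the function names are hypothetical.

```python
def prefers_cooperation(l_c, l_d, p_add, R, S, T, I):
    """Eq. (6): cooperate if P_add + l_c*R + l_d*S > l_c*T + l_d*I."""
    return p_add + l_c * R + l_d * S > l_c * T + l_d * I


def max_cooperative_degree(beta, R, T):
    """Bound beta*R/(T - R) on the degree when all neighbours cooperate and P_add = beta*R."""
    return beta * R / (T - R)


# Assumed illustrative payoffs with T - R = 1.
print(max_cooperative_degree(beta=6, R=3, T=4))
print(prefers_cooperation(l_c=17, l_d=0, p_add=18, R=3, S=-1, T=4, I=0))
print(prefers_cooperation(l_c=18, l_d=0, p_add=18, R=3, S=-1, T=4, I=0))
```

With these numbers an agent with 17 cooperating neighbours still prefers cooperation, while an agent with 18 no longer does.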

3.2 Dependence on update-probability $p_u$

In reality, agents are not infinitely fast in assessing new information in their
surroundings, as they need time to adopt and employ decisions. It is known
from previous studies [13] that asynchronous update can strongly influence
the observed dynamics and level of cooperation. Taking this into account by
lowering $p_u$, the oscillations in the overall population are increasingly
damped, indicating that the network is stabilized in comparison to the
$p_u = 1$ case (see Figure 2). In contrast to the rapid update mode at
$p_u = 1$, the range of $f_c$ exhibits a reduced span (about 12% of the overall
population) and the average number of links per agent stabilizes at
$\langle l_i \rangle \approx 13$. Only an average of 0.5% of the agents have
lost all their links at a given timestep (and are then randomly rewired).
Decreasing $p_u$ allows for a mean-field approximation of $l_i$, denoted
$\langle l_i \rangle^{\mathrm{mf}}$, based on Eq. (6): If the number of
cooperating agents does not oscillate too strongly, the additional payoff
averaged over the neighbours can be roughly estimated to be
$\langle P_i^{\mathrm{add}}(t+1) \rangle \approx \beta R f_c$. Since for the
specific form of payoff matrix chosen here $T - R = I - S = 1$, one can simply
add $l_i^c(t)$ and $l_i^d(t)$ in Eq. (6) and obtains
$\langle l_i \rangle^{\mathrm{mf}} \approx 13$ for $p_u = 0.6$
($f_c \approx 0.75$). The actually observed average of
$\langle l_i \rangle \approx 12$ is in agreement with this approximation.
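The steps behind this mean-field estimate can be written out explicitly. The following is a sketch of the argument under the stated condition $T - R = I - S = 1$; the value $R = 3$ used in the last line is an assumption (it is the value for which the estimate comes out close to the quoted 13).

```latex
\begin{align*}
P_i^{\mathrm{add}} + l_i^c R + l_i^d S &> l_i^c T + l_i^d I \\
P_i^{\mathrm{add}} &> l_i^c (T - R) + l_i^d (I - S) = l_i^c + l_i^d = l_i \\
\langle l_i \rangle^{\mathrm{mf}} &\approx \langle P_i^{\mathrm{add}} \rangle \approx \beta R f_c \\
\beta = 6,\; R = 3,\; f_c \approx 0.75 \;&\Rightarrow\; \langle l_i \rangle^{\mathrm{mf}} \approx 13.5
\end{align*}
```

Agents thus stop cooperating once their degree reaches the expected additional payoff, which fixes the typical degree at roughly $\beta R f_c$.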

We now investigate the dependence of $f_c$ on the chosen update-probability
$p_u$ more closely: In Figure 3a we show the $f_c$ states visited as a function
of $p_u$. The plot shows 500 realizations of $f_c$ (y-axis) for different
values of $p_u$ (taken after discarding the first $10^3$ steps for each $p_u$).
In Figure 3a, $p_u$ is changed in discrete steps of $\Delta p_u = 0.0025$. One
recognizes that the $f_c$-states are not trivially visited by the system:
Although the system is oscillating strongly and periodically at $p_u = 1$
because of deterministic aspects in the evolution, all the points within the
amplitude of the oscillations are visited due to the randomness introduced at
various points (e.g. randomness in the chosen next-to-nearest neighbours,
randomness in the strategy chosen if expected payoffs are equal for cooperation
and defection, etc.). Slightly lowering $p_u$ reveals interesting effects on
the configuration of the limit cycle (see the inset of Fig. 3a): One recognizes
that the limit cycle first comprises 3 main points between which the system
'hops' (i.e. it changes from the state with high $f_c$ to the state with low
$f_c$ with one intermediate step and vice versa), then 4 points and then again
3 points. For some values of $p_u \in [0.965, 0.995]$ certain states between
$f_c^{\mathrm{max}}$ and $f_c^{\mathrm{min}}$ are never reached. Between
$p_u = 0.7$ and $p_u = 0.6$ the most frequently visited states change from
$f_c^{\mathrm{max}}$ and $f_c^{\mathrm{min}}$ to the average value of $f_c$ (the
limit cycle vanishes). This is also evident from plotting $f_c(t+1)$ against
$f_c(t)$ in Figure 4. One recognizes that decreasing $p_u$ to 0.7 leads to a
smaller gap in the attractor; at $p_u = 0.6$ the gap has vanished (not shown).
A further decrease in $p_u$ narrows the space filled by the trajectory of the
system (see Figure 4, $p_u = 0.1$). We have also determined the average of the
double amplitude $(f_c^{\mathrm{max}} - f_c^{\mathrm{min}})$ of the
oscillations, see Figure 3b. For each value of $p_u$ investigated we simulated
various realizations of time-series of length $T = 10^3$ (discarding the first
$10^3$ steps) and averaged the obtained ranges over these realizations. As
Figure 3b shows, the range of oscillations reduces for lower
$\alpha = \beta$, as expected. We also did not find that the amplitude follows
a simple scaling function along the bifurcation, thus suggesting that the
observed change in dynamics is more complicated than a simple
Hopf-bifurcation.

As far as the overall dependence of the mean of $f_c$ on $p_u$ is concerned,
Figure 5a shows $\langle f_c \rangle$ for different values of $p_u$. For
$\alpha = \beta = 6$, the curve exhibits a maximum at $p_u \approx 0.6$, which
can be intuitively understood as a trade-off effect between two aspects: On the
one hand, decreasing synchronization improves stability and efficiency in the
system as the estimates for the future actions of neighbours become better and
overreaction (extreme oscillations) is reduced. On the other hand, decreasing
$p_u$ reduces efficiency by reducing the reaction of agents to the changes of
their neighbours' strategies. Turning towards the aspect of decreasing
$\alpha$ and $\beta$, the system is stabilized at a higher value of $p_u$, as
the gray line corresponding to $\alpha = \beta = 4$ has its maximum at
$p_u \approx 0.7$. This can be intuitively understood: As link-dynamics are
slowed down for $\alpha = \beta = 4$, agents have to be able to react slightly
faster to employ efficient internal sanctions via cancellation of links.
Another intuitive reason lies in the fact that lowering $\alpha = \beta$
effectively reduces the amplitude of oscillations. One would therefore
anticipate that the limit cycle vanishes for a higher value of $p_u$. We also
note that these macroscopic dynamics result in a nice aspect: If agents are too
eager (too fast, high update probability $p_u$) to optimize their
neighbourhood, the global level of cooperation becomes suboptimal when compared
to slow adaptation ('sloppiness', low $p_u$): If agents optimize their
situation too 'fast' everybody is worse off on average.

Fig. 4. Visualization of 'attractors' in the space $\{f_c(t), f_c(t+1)\}$ for
different values of $p_u$ ($p_u = 0.1$, $p_u = 0.7$, $p_u = 0.8$) and
$\alpha = \beta = 6$.

Fig. 5. (a) Average fraction of cooperating agents $\langle f_c \rangle$ as a
function of update-probability $p_u$ for $\alpha = \beta = 6$ and
$\alpha = \beta = 4$. (Taking averages of highly correlated time-series is to a
certain extent problematic, which is why the line is drawn broken in the
corresponding regime. To guide the eye, actual values (circles) have been
interpolated by a cubic spline.) (b) Correlation length $\lambda$ determined by
an exponential fit to the auto-correlation function of $\Delta f_c(t)$ as given
in Eq. (7). For very small $p_u$ the exponential fit becomes problematic
because the process becomes practically uncorrelated, i.e. the correlation
function turns into a Dirac delta function (the correlation length is not shown
for $p_u < 0.3$).

We have further taken a closer look at the correlations in the system via the
auto-correlation function of the first differences of $f_c(t)$, given by
$\Delta f_c(t) = f_c(t) - f_c(t-1)$. The envelope of the auto-correlation
function is fitted to an exponential with inverse correlation length
$\lambda$, i.e.

$$\langle \Delta f_c(t+\tau) \, \Delta f_c(t) \rangle \sim e^{-\lambda \tau} \tag{7}$$

for $\tau > 0$. Values of $\lambda$ for different update-probabilities are
summarized in Figure 5b. As expected, between $p_u = 1$ and $p_u = 0.8$
correlation is very strong. Lowering $p_u$ below 0.8 leads to a decrease in the
correlation, where the exponential fit becomes more and more problematic. We
found that for $p_u < 0.3$, the correlation function resembles a Dirac delta
function and the exponential fit is no longer meaningful.
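The analysis of Eq. (7) can be sketched with the standard library alone. This is not the fitting procedure used for Figure 5b: taking the absolute value of the auto-correlation as a stand-in for its envelope and fitting a straight line to its logarithm are assumptions made for this illustration, as are all function names.

```python
import math


def first_differences(series):
    """Differences f(t) - f(t-1) of a time-series."""
    return [b - a for a, b in zip(series, series[1:])]


def autocorrelation(x, lag):
    """Normalised auto-correlation of x at the given lag (mean removed)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def fit_decay_rate(acf_values):
    """Least-squares slope of log|C(tau)| against tau = 1, 2, ...; returns minus the slope."""
    pts = [(tau, math.log(abs(c))) for tau, c in enumerate(acf_values, start=1) if c != 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return -sxy / sxx


# Synthetic check: an oscillating correlation with an exponentially decaying envelope.
synthetic = [(-1) ** tau * math.exp(-0.5 * tau) for tau in range(1, 6)]
print(fit_decay_rate(synthetic))  # close to the decay rate 0.5 used to build the data
```

For a measured series, `autocorrelation(first_differences(f_c), lag)` would be evaluated for a range of lags before the fit; for nearly uncorrelated data the logarithm is dominated by noise, which mirrors the difficulty noted above for small update probabilities.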

Let us now discuss the important point of how sensitive the results are to
changes in the specific dynamics chosen. To this end, we have compared
results of the model in the form presented here with a formulation without
the $\Theta(P_j(t))$ term in Eq. (5). This variation only leads to slight
changes in the oscillatory states of the system at high $p_u$ (giving a
slightly lower $\langle f_c \rangle$). For lower $p_u$ the difference between
the two formulations becomes negligible (not shown). A more massive change in
the dynamics occurs when reformulating the model by dropping the specific
assumption of locality, i.e. the assumption of building up new links only to
next-to-nearest neighbours. To do so, we implemented a variant in which
$\beta$ new links to a set of random nodes
$\mathcal{N}_i^{\mathrm{rand}}(t+1;\beta)$ in the system are established. The
agents now know the strategies of random players in the system and the payoff
of additional links is determined by
$P_i^{\mathrm{add,rand}}(t+1) = \sum_{j \in \mathcal{N}_i^{\mathrm{rand}}(t+1;\beta)} a^c P_{ij} a_j(t)$.
Figure 2c shows the respective dynamics of $f_c(t)$: For
$p_u^{\mathrm{rand}} = 1$ we recover oscillatory behaviour. Compared to the
$p_u = 1$ case of the initial model we observe a considerably reduced
amplitude. This is expected since $\mathcal{N}_i^{\mathrm{rand}}(t+1;\beta)$
will always contain defectors, whereas the initial $\mathcal{N}_i(t+1)$ mainly
consists of cooperators (as the next-to-nearest neighbours are typically
cooperators since links to defectors are immediately cancelled). This can be
compared to the expected payoff being reduced by lowering $p_u$ in the initial
model, which results in more defecting next-to-nearest neighbours (as sanctions
are not applied immediately). Figure 2c suggests that these two effects are
nearly analogous when choosing the update-probabilities appropriately. Apart
from the observation that the average level of cooperation in the random
variant is slightly higher than in the initial model, the time-series match
quite well (see the empirical distribution in the inset). Concerning the
slightly higher mean of the random variant, we conjecture that this is due to a
welfare effect stemming from the elimination of imperfect information and from
knowledge about global topology. Although the 'random variant' of the model
provides a closer understanding of the model-dynamics, we will continue with
the discussion of the initial model since it is much more realistic.



3.3 Impact of 'agility' $\alpha$ and $\beta$ and the influence of payoff matrix elements

To quantitatively describe the influence of the parameters $\alpha$ and
$\beta$ on $f_c$, we kept both parameters equal and performed simulations for
values ranging from $\alpha = \beta = 1$ to $\alpha = \beta = 7$. Results are
summarized in Figure 6a, for $p_u = 0.5$ and for the payoff matrix given in
Eq. (1). Only for a parametrization of $\alpha = \beta = 1$, the majority of
agents is defecting. One recovers very unstable networks of low average
connectivity of approximately 1.3 links per agent in this case. For higher
values of $\alpha = \beta$, the system gets initially stabilized and the
increase of $f_c$ flattens as $\alpha$ increases. This can be understood as the
parameter $\alpha$ has reached a value where cooperating agents are able to
cancel virtually all the links they have with defecting agents. In other words,
the internal sanction-potential of the system has reached a maximum.

Fig. 6. (a) Influence of parameters $\alpha$ and $\beta$ on the ratio of
cooperating agents in a population of $10^3$ players. The two parameters have
been kept equal. Simulations were done for $p_u = 0.5$. (b) Influence of the
payoff matrix element $R$ on $\langle f_c \rangle$, where $T = 1 + R$ and the
other elements remain unchanged.

Clearly, not only the parameters $\alpha$, $\beta$ and $p_u$ but also the entries
of the payoff matrix influence the dynamics obtained within the presented model.
In this context, the entry for the defector-defector constellation in Eq. (1) is
of fundamental importance: When it is chosen such that defecting agents keep the
links between one another, a collapse of cooperation in the system is observed.
On the other hand, increasing the temptation $T$ and the reward $R$ while holding
their difference $T - R$ constant increases the average number of cooperating
agents. This is expected, as the relative advantage of defectors over cooperators
is reduced. The corresponding relation is captured quantitatively in Figure 6b
for two values of $p_u$ ($p_u = 0.5$ and $p_u = 0.1$), $\alpha = \beta = 6$ and
$T = 1 + R$. Again, we observe a saturation effect similar to the one found when
varying $\alpha$ and $\beta$. Additionally, for low values of $R$ the update
probability has a comparatively larger influence on $\langle f_c \rangle$, whereas
in the saturation region the increase caused by switching between $p_u = 0.1$ and
$p_u = 0.5$ becomes progressively smaller.
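
One illustrative way to make the reduced relative advantage explicit (a reading assumed here, not a quantity defined in the text) is to compare the defector's extra gain $T - R$ with the reward $R$ itself. For the parametrization $T = 1 + R$ this gives

$$ \frac{T - R}{R} = \frac{1}{R} , $$

so the ratio falls from $1$ at $R = 1$ to $0.2$ at $R = 5$ (values of $R$ chosen only for illustration): the absolute gain from defecting stays fixed while its weight relative to the reward for mutual cooperation shrinks.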

Fig. 7. Degree distributions averaged over time series with $T = 10^5$, $N = 10^3$,
$\alpha = \beta = 6$ for two different values of the update probability $p_u$
(0.1 and 0.6). The tail for $p_u = 0.6$ may be fitted by a power law
$P(k) \sim k^{-\gamma}$ with $\gamma \approx 8.6$.

3.4 Emerging network topology 

The networks obtained as snapshots of the dynamics exhibit interesting properties
that resemble features of real-world networks. We confine ourselves to the two
most widely used quantities in the analysis of networks, the degree distribution
and the cluster coefficient [1,2]. Figure 7 shows the degree distribution $P(k)$
in a double-logarithmic plot for $N = 10^3$ and two values of the update
probability, $p_u = 0.1$ and $p_u = 0.6$. To improve the accuracy of the plot,
degree distributions of networks at $10^3$ different times have been averaged.
Correlations in the time series have been taken care of by using time intervals
of inverse correlation length. For $p_u = 0.1$, the power-law fit shown is
slightly inadequate and indicates a function somewhere between a power law and
an exponential. For $p_u = 0.6$, the double-logarithmic plot indicates that the
tail of the distribution can be expressed as a scaling law
$P(k) \sim k^{-\gamma}$ with $\gamma \approx 8.6$. This shows that the network is
clearly not random but possesses self-similar structure. When $p_u$ is lowered,
the network loses structure and becomes more random. We repeated the analysis
for larger networks ($N = 5000$) and did not obtain different results.
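
The fitting procedure behind the exponent is not spelled out above. As a minimal sketch, assuming the tail exponent is read off as the slope of a least-squares line in the double-logarithmic plot, the estimate could be obtained as follows. The function names and the cut-off `k_min` are hypothetical and not part of the model.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical degree distribution {k: P(k)} from a list of node degrees."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in sorted(counts.items())}


def fit_tail_exponent(pk, k_min):
    """Estimate gamma in P(k) ~ k^(-gamma) for k >= k_min.

    Assumption: an ordinary least-squares line through (log k, log P(k));
    the source does not state which fitting method was used.
    """
    points = [
        (math.log(k), math.log(p))
        for k, p in pk.items()
        if k >= k_min and k > 0 and p > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two tail points for a fit")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # The slope of the log-log line is -gamma.
    return -sxy / sxx
```

To mimic the averaging described above, the degree lists of many snapshots would be pooled before calling `degree_distribution`. A least-squares fit on log-log axes is a simple choice and is sensitive to the cut-off `k_min`, so the result should be checked for several cut-offs.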

The cluster coefficient $C$, defined as the average of all individual cluster
coefficients $C_i$, provides a quantitative measure for cliques (i.e. circles of
acquaintances in which every member knows every other member) in the network.
The individual cluster coefficient of a node $i$ is defined as

$$ C_i = \frac{2 E_i}{k_i (k_i - 1)} , \qquad (8) $$

where $E_i$ is the number of existing edges between the neighbours of $i$ and
$k_i (k_i - 1)/2$ is the highest possible number of edges between these
neighbours. As the expected total number of edges in a random graph can be
obtained via $l_{tot} = p N (N - 1)/2$, one can compare the cluster coefficient
obtained from

Fig. 8. (a) Time averages $\langle C \rangle$ of the cluster coefficient for
different values of $p_u$ and $\alpha = \beta = 6$. The inset shows the cluster
coefficients of equivalent random graphs, denoted by $\langle C_{rand} \rangle$.
$\langle C_{rand} \rangle$ decreases strongly for $p_u > 0.7$ because the average
number of links in the system drops considerably in this regime. (b) Individual
cluster coefficients $C_i$ plotted against individual degree $k_i$ for
$p_u = 0.6$ and $\alpha = \beta = 6$. The inset shows the tail of the
corresponding distribution in a double-logarithmic plot, where the individual
$C_i$'s have been averaged. The slope of the interpolating line is
$\delta \approx -0.4$.



given networks to those of equivalent random networks. Figure 8a shows the
average cluster coefficients from simulations at different values of $p_u$, the
other parameters being kept fixed ($\alpha = \beta = 6$). For comparison, the
cluster coefficients of equivalent random graphs are also shown
($C_{rand} = p = \langle k \rangle / N$). The observed networks exhibit large
cluster coefficients compared to those of equivalent random graphs. This is not
surprising, as our linking mechanism directly favours the formation of cliques.
The dependence of the cluster coefficient on $p_u$ shows a minimum at
$p_u = 0.6$. Interestingly, this minimum corresponds to the maximum of the
number of cooperating agents in Figure 5 and to the value of $p_u$ at which the
degree distribution was best fitted by a power law.
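
The comparison can be sketched in a few lines, assuming an undirected network stored as a dictionary that maps each node to the set of its neighbours (a representation chosen here for illustration, with hypothetical function names). The convention $C_i = 0$ for nodes with fewer than two neighbours is also an assumption, since Eq. (8) is undefined in that case.

```python
def local_clustering(adj, i):
    """Individual cluster coefficient C_i = 2 E_i / (k_i (k_i - 1)), Eq. (8)."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: Eq. (8) is undefined for k_i < 2
    # E_i: edges among the neighbours of i, each counted once (u < v).
    edges = sum(
        1 for u in neighbours for v in neighbours if u < v and v in adj[u]
    )
    return 2.0 * edges / (k * (k - 1))


def average_clustering(adj):
    """Cluster coefficient C: the average of all individual C_i."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def random_graph_clustering(adj):
    """Baseline C_rand = <k> / N of a random graph with the same N and <k>."""
    n = len(adj)
    mean_degree = sum(len(neighbours) for neighbours in adj.values()) / n
    return mean_degree / n


# Toy example: a triangle (0, 1, 2) with one pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(toy), random_graph_clustering(toy))
```

For the toy graph the individual coefficients are $1$, $1$, $1/3$ and $0$, so $C = 7/12$, while the random-graph baseline is $\langle k \rangle / N = 2/4 = 0.5$.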

Plotting the cluster coefficients $C_i$ of individual agents against their
degree $k_i$ allows for a more detailed analysis of network structure. The
corresponding plot is shown in Figure 8b, where each point corresponds to a pair
$(k_i, C_i)$. The points have been sampled from 100 different networks. Based on
these data, we have calculated the mean cluster coefficient as a function of the
degree of the nodes, denoted by $\langle C_i \rangle (k)$. The tail of this
distribution is shown on a double-logarithmic scale in the inset of Figure 8b.
Clearly, there is a non-random relationship between cluster coefficient and
degree. The underlying networks exhibit (complex) hierarchical organisation: for
small degrees the mean clustering is much higher than for large degrees. We also
evaluated the number of cooperating agents as a function of degree, finding that
$f_c(k)$ grows with degree (not shown). This confirms our expectation that the
cooperators are the ones who build up new links and at the same time do not
suffer from losing ties.
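
The degree-resolved average $\langle C_i \rangle (k)$ amounts to grouping the sampled pairs by degree. The following is a minimal sketch, assuming the pairs $(k_i, C_i)$ from all sampled networks have already been pooled into one list; the function name is hypothetical.

```python
from collections import defaultdict


def mean_clustering_by_degree(pairs):
    """Return {k: mean C_i over all nodes with degree k}.

    `pairs` is an iterable of (k_i, C_i) tuples, assumed to be pooled
    over all sampled network snapshots.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, c in pairs:
        sums[k] += c
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Toy data: two nodes of degree 2 and one node of degree 3.
print(mean_clustering_by_degree([(2, 1.0), (2, 0.5), (3, 0.25)]))
```

Plotting the resulting values against $k$ on double-logarithmic axes gives the kind of curve shown in the inset, and a decreasing trend corresponds to the hierarchical organisation described above.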

Finally, Figure 9 shows the average distribution of individual payoffs in the
system for $p_u = 0.1$ and $p_u = 0.6$. Although the maximum of the distribution
is at a higher payoff for $p_u = 0.1$, the average payoff is higher for
$p_u = 0.6$, as the tail of the distribution is 'fatter' in this case. The inset
shows the tails of the distribution in a semi-logarithmic plot, indicating that
the tails are close to exponential.

3.5 Experimental evidence for cooperation networks

As a proxy for a cooperation network of humans we consider telephone call
networks. It is reasonable to assume that communication between cooperating
individuals will dominate the total number of calls, while non-cooperating
individuals will avoid communication. Recent research on real mobile phone call
networks [26] found a power-law degree distribution with a characteristic
exponent $\gamma_{mobile} \approx 8.4$. The exponent obtained within our model
($\gamma \approx 8.6$) is close to this value. This suggests that our model
captures dynamics of real-world networks and has some predictive value. We think
that the experimental procedure behind the data reported in [26] (measurements
of networks over a clearly limited time span) is much more comparable to the
averaging procedure in our simulations than the procedures behind many other
investigations to date, which often involve effects of growth.



4 Summary 

In this work, we have considered the prisoner's dilemma played on dynamic
networks under the assumptions of rationality and strictly local information
horizons of the agents. The novelty lies in the fact that links in the network
are treated as a dynamical variable while, at the same time, we adopt an update
scheme based on profit maximization and not on imitation. The network on which
the game is played is thus an emergent structure, co-evolving with the
configuration of strategy space. Within this framework, reasonable assumptions
about fully rational individual decisions lead to a model of network dynamics in
which defectors are effectively sanctioned in two ways: implicitly, by being
affected by link cancellations, and explicitly, by not being able to establish
new links, as the players minimize potential losses by accepting only
'recommended' co-players.

We showed that the dynamics implied are non-linear and lead to the emergence of
cooperative behaviour even within a framework of rationality. More precisely, we
observed distinct modes in the model: In the case of high synchronization of the
agents' decisions, significant oscillations of global parameters appear and many
resources are wasted in collective movements. We have discussed the dependence
of the system on the control parameter in this regime. For low synchronization
of the agents, randomness in the system and delay of the players' reactions
reduce cooperation. For regimes in between high and low synchronization, we
showed that the system reaches an optimum, where network characteristics
resemble those of complex networks, exhibiting clearly non-random properties
like a power-law degree distribution and hierarchical clustering. In this
respect it is especially remarkable that our model yields a rather high tail
exponent $\gamma \approx 8.6$, close to that of real-world communication
networks (compare with $\gamma_{mobile} \approx 8.4$ in [26]).

Fig. 9. Average distribution of the individual payoffs for $p_u = 0.1$ (broken
line) and $p_u = 0.6$ (solid line) ($\alpha = \beta = 6$). The inset shows the
tails of the distribution in a semi-logarithmic plot.

It is interesting that oscillatory dynamics, inherent to high synchronization,
have also been found in a spatial formulation of the prisoner's dilemma in which
participation in the game was voluntary [21]. Thus it seems that the cyclical
dominance of the strategies found in [21] can be qualitatively confirmed even
within the picture of (highly) dynamic networks. The fraction of cooperating
agents in our model was found to be bounded from above by $f^+ \approx 0.9$ and
from below by $f^- \approx 0.4$, showing a saturation towards $f^+ = 0.9$ with
respect to the studied parameters. This is above the level of cooperation found
in the voluntary formulation of the PD [21] and in the initial work of Nowak &
May [12], but below typical fractions found for the PD on variable networks with
imitative behavior of agents [24]. It is not surprising that imitation on
dynamic networks yields a higher overall degree of cooperation than rationality,
since on fixed structures cooperation is sustainable for imitation but not (or
much less) sustainable for rational settings.

The current work may be extended in various directions. We expect that
introducing heterogeneity in the payoff matrix and in the parameters $\alpha$
and $\beta$ (i.e. letting these parameters take different values for different
agents) could lead to further interesting results. We also conjecture that
coupling $\beta$ to some measure of payoff (fitness) of the individual agents
should introduce some new, realistic effects.